Philosophy and Definition for a Universal Genetic Sequence Database

Author

  • Thomas D. Schneider
Abstract

Modern sequence databases have many problems because they are not carefully defined. For example, when one searches for homology with a given sequence, many duplicate sequences are found in each database and similar but not identical results are obtained from other databases. The extra copies of inconsistent information are slowing down research. This situation could easily be avoided by removing redundancy in the databases, but this goal is not a fundamental component of the database design and has been neglected. A clear statement of goals for storing genetic sequence information is required.

To this end, five documents, in decreasing order of importance, are proposed:

  (1) The Philosophy document defines guiding principles for the design and use of the database.
  (2) The Definition document identifies what is to be stored in the database, following the guidance of the philosophy. It is machine and computer-language independent.
  (3) The Implementation document translates the definition into computer code which can be run on machines.
  (4) Examples of all database objects allow users to test their database analysis programs.
  (5) A Tutorial explains how the database is organized.

A philosophy and the beginnings of a definition are given in this paper.

Individual researchers can help by:

  1. providing correct and complete sequence data;
  2. annotating their sequences with experimentally deduced features;
  3. providing standard genetic names for all features;
  4. updating the annotation as new knowledge is obtained;
  5. periodically verifying the accuracy of the data and annotation in the public

∗ National Institutes of Health, National Cancer Institute at Frederick, Gene Regulation and Chromosome Biology Laboratory, P. O. Box B, Frederick, MD 21702-1201. (301) 846-5581, email: [email protected], http://alum.mit.edu/www/toms/

Publication date: 2011